Scalability of Shor's algorithm with a limited set of rotation gates 
Typical circuit implementations of Shor's algorithm involve controlled rotation gates of magnitude $\pi/2^{2L}$, where $L$ is the binary length of the integer $N$ to be factored. Such gates cannot be implemented exactly using existing fault-tolerant techniques. Approximating a given controlled $\pi/2^d$ rotation gate to within $\delta = O(1/2^d)$ currently requires both a number of qubits and a number of fault-tolerant gates that grows polynomially with $d$. In this paper we show that this additional growth in space and time complexity would severely limit the applicability of Shor's algorithm to large integers. Consequently, we study in detail the effect of using only controlled rotation gates with $d$ less than or equal to some $d_{\max}$. It is found that integers up to length $L_{\max} = O(4^{d_{\max}})$ can be factored without significant performance penalty, implying that the cumbersome techniques of fault-tolerant computation only need to be used to create controlled rotation gates of magnitude $\pi/64$ if integers thousands of bits long are to be factored. Explicit fault-tolerant constructions of such gates are also discussed.



Shor's factoring algorithm [1, 2] is arguably the driving force of much experimental quantum computing research. It is therefore crucial to investigate whether the algorithm has a realistic chance of being used to factor commercially interesting integers. This paper focuses on the difficulty of implementing the quantum Fourier transform (QFT), an integral part of the algorithm. Specifically, the controlled $\pi/2^d$ rotations that comprise the QFT are extremely difficult to implement using fault-tolerant gates protected by quantum error correction (QEC).

To factor an $L$-bit integer $N$, a $2L$-qubit QFT is required that at first glance involves controlled rotation gates of magnitude $\pi/2^{2L}$. Prior work on simplifying the QFT so that it only involves controlled rotation gates of magnitude $\pi/2^{d_{\max}}$ has been performed by Coppersmith [3], with the conclusion that the length $L_{\max}$ of the maximum length integer that can be factored scales as $O(2^{d_{\max}})$ and that factoring an integer thousands of bits long would require the implementation of controlled rotations as small as $\pi/10^6$. This paper refines this work with the conclusion that $L_{\max}$ scales as $O(4^{d_{\max}})$, with $\pi/64$ rotations sufficient to enable the factoring of integers thousands of bits long.

The discussion is organized as follows. In Section I, Shor's algorithm is reviewed with emphasis on extracting useful output from the quantum period finding (QPF) subroutine, which is described there in detail. In Section II, Coppersmith's approximate quantum Fourier transform (AQFT) is described, followed by Section III, which outlines the techniques used to implement the gate set required by the AQFT using only fault-tolerant gates protected by QEC. In Section IV, the relationship between the probability of success $s$ of the QPF subroutine and the period $r$ being sought is investigated. In Section V, the relationship between the probability of success $s$ and both the length $L$ of the integer being factored and the minimum angle controlled rotation $\pi/2^{d_{\max}}$ is investigated. Section VI concludes with a summary of results.

I. SHOR'S ALGORITHM

Shor's algorithm factors an integer $N = N_1N_2$ by finding the period $r$ of a function $f(k) = m^k \bmod N$, where $1 < m < N$ and $\gcd(m, N) = 1$. Provided $r$ is even and $f(r/2) \neq N - 1$, the factors are $N_1 = \gcd(f(r/2) + 1, N)$ and $N_2 = \gcd(f(r/2) - 1, N)$, where gcd denotes the greatest common divisor. The probability of finding a suitable $r$ given a randomly selected $m$ such that $\gcd(m, N) = 1$ is greater than 0.75 [4]. Thus on average very few values of $m$ need to be tested to factor $N$.

The quantum heart of Shor's algorithm can be viewed as a subroutine that generates numbers of the form $j \approx c2^{2L}/r$. To distinguish this from the necessary classical pre- and postprocessing, this subroutine will be referred to as QPF (quantum period finding). For physical reasons, the probability $s$ that QPF will successfully generate useful data may be quite low, with many repetitions required to work out the period $r$ of a given $f(k) = m^k \bmod N$. Using this terminology, Shor's algorithm consists of classical preprocessing, potentially many repetitions of QPF with classical postprocessing, and possibly a small number of repetitions of this entire cycle. This cycle is summarized in Fig. 1.
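The classical cycle of Fig. 1 can be sketched in Python as follows. This is a minimal illustration, not any of the circuits of Table I: the quantum step (QPF plus continued-fraction postprocessing) is replaced by a hypothetical helper `find_period_classically` that simply brute-forces the order of $m$ modulo $N$, which takes exponential time and is exactly the step QPF is meant to speed up. The names `try_base` and `shor_classical_loop` are likewise illustrative.

```python
import math
import random


def find_period_classically(m, N):
    """Hypothetical stand-in for QPF plus postprocessing: brute-force the
    smallest r > 0 with m**r = 1 (mod N). Exponential in the bit length of N."""
    r, x = 1, m % N
    while x != 1:
        x = (x * m) % N
        r += 1
    return r


def try_base(N, m):
    """One pass through Fig. 1 for a fixed base m with gcd(m, N) = 1.
    Returns (N1, N2) on success, or None if r is odd or m**(r/2) = -1 (mod N)."""
    r = find_period_classically(m, N)
    if r % 2 == 1:
        return None
    half = pow(m, r // 2, N)      # f(r/2); cannot equal 1 because r is minimal
    if half == N - 1:
        return None
    return math.gcd(half - 1, N), math.gcd(half + 1, N)


def shor_classical_loop(N, seed=0, max_tries=100):
    """Repeat with random bases 1 < m < N until a nontrivial factorization is found."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        m = rng.randrange(2, N)
        if math.gcd(m, N) != 1:
            continue              # Fig. 1 only uses bases coprime to N
        factors = try_base(N, m)
        if factors is not None:
            return factors
    return None
```

For the example used later in this section, `try_base(143, 2)` finds $r = 60$ and $2^{30} \bmod 143 = 12$, giving the factors 11 and 13.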

A number of different quantum circuits implementing QPF have been designed [5, 6, 7, 8]. Table I summarizes the number of qubits required and the depth of each of these circuits. The depth of a circuit has been defined to be the minimum number of 2-qubit gates that must be applied sequentially to complete the circuit. It has been assumed that multiple disjoint 2-qubit gates can be implemented in parallel, hence the total number of 2-qubit gates can be significantly greater than the depth. For example, the Beauregard circuit has a 2-qubit gate count of $8L^4$ to first order in $L$. Note that in general the depth of the circuit can be reduced at the cost of additional qubits.

The underlying algorithm common to each circuit begins by initializing the quantum computer to a single pure state $|0\rangle_{2L}|0\rangle_L$. Note that for clarity the computer state has been broken into a $2L$-qubit $k$-register and an $L$-qubit $f$-register. The meaning of this will become clearer below.

[Flowchart: select $1 < m < N$ such that $\gcd(m, N) = 1$ (classical) → QPF → try to use $j$ to find the period $r$ of $f(k) = m^k \bmod N$ (classical; fail: repeat QPF) → test whether $r$ is even and $m^{r/2} \bmod N \neq \pm 1 \bmod N$ (classical; fail: select a new $m$) → $N_1 = \gcd(m^{r/2} - 1, N)$, $N_2 = \gcd(m^{r/2} + 1, N)$ (classical).]

FIG. 1: The complete Shor's algorithm including classical pre- and postprocessing. The first branch is highly likely to fail, resulting in many repetitions of the quantum heart of the algorithm, whereas the second branch is highly likely to succeed.

TABLE I: Number of qubits required and circuit depth of different implementations of Shor's algorithm. Where possible, figures are accurate to first order in $L$.

Circuit           Qubits         Depth
Beauregard [7]    $2L$           $32L^3$
Vedral [5]        $5L$           $240L^3$
Zalka [8]         $\sim 50L$     $\sim 2^{17}L^2$
Gossett [6]       $O(L^2)$       $O(L \log L)$

Step two is to Hadamard transform each qubit in the $k$-register, yielding

$$\frac{1}{2^L}\sum_{k=0}^{2^{2L}-1}|k\rangle_{2L}|0\rangle_L \qquad (1)$$



Step three is to calculate and store the corresponding values of $f(k)$ in the $f$-register:

$$\frac{1}{2^L}\sum_{k=0}^{2^{2L}-1}|k\rangle_{2L}|f(k)\rangle_L \qquad (2)$$



Note that this step requires additional ancilla qubits. 
The exact number depends heavily on the circuit used. 

Step four can actually be omitted, but it explicitly shows the origin of the period $r$ being sought. Measuring the $f$-register yields

$$\frac{\sqrt{r}}{2^L}\sum_{n=0}^{2^{2L}/r-1}|k_0+nr\rangle_{2L}|f_M\rangle_L \qquad (3)$$

where $k_0$ is the smallest value of $k$ such that $f(k)$ equals the measured value $f_M$.

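Step five is to apply the quantum Fourier transform

$$|k\rangle \rightarrow \frac{1}{2^L}\sum_{j=0}^{2^{2L}-1}\exp\left(\frac{2\pi i}{2^{2L}}jk\right)|j\rangle \qquad (4)$$

to the $k$-register, resulting in

$$\frac{\sqrt{r}}{2^{2L}}\sum_{j=0}^{2^{2L}-1}\sum_{p=0}^{2^{2L}/r-1}\exp\left(\frac{2\pi i}{2^{2L}}j(k_0+pr)\right)|j\rangle_{2L}|f_M\rangle_L \qquad (5)$$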

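The probability of measuring a given value of $j$ is thus

$$\Pr(j,r,L) = \left|\frac{\sqrt{r}}{2^{2L}}\sum_{p=0}^{2^{2L}/r-1}\exp\left(\frac{2\pi i}{2^{2L}}jpr\right)\right|^2 \qquad (6)$$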



If $r$ divides $2^{2L}$, Eq. (6) can be evaluated exactly. In this case the probability of observing $j = c2^{2L}/r$ for some integer $0 \le c < r$ is $1/r$, whereas if $j \neq c2^{2L}/r$ the probability is 0. This situation is illustrated in Fig. 2(a). However, if $r$ divides $2^{2L}$ exactly, a quantum computer is not needed, as $r$ would then be a power of 2 and easily calculable. When $r$ is not a power of 2, the perfect peaks of Fig. 2(a) become slightly broader, as shown in Fig. 2(b). All one can then say is that with high probability the value $j$ measured will satisfy $j \approx c2^{2L}/r$ for some $0 \le c < r$.
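Eq. (6) is easy to evaluate numerically for small $L$. The sketch below assumes the sum runs over the $\lfloor 2^{2L}/r \rfloor$ values $k_0 + pr$ below $2^{2L}$ and normalizes by the actual number of terms, which reduces to the prefactor $\sqrt{r}/2^{2L}$ of Eq. (6) when $r$ divides $2^{2L}$; $k_0$ only contributes a global phase and is omitted. The function name `prob_qft` is hypothetical. For $2^{2L} = 256$ and $r = 8$ it reproduces the sharp peaks of Fig. 2(a).

```python
import cmath
import math


def prob_qft(j, r, L):
    """Pr(j, r, L) of Eq. (6) after an exact QFT on 2L qubits.

    Assumes floor(2^(2L) / r) terms in the sum over p and normalizes by that
    count, so the probabilities over all j sum to exactly 1."""
    Q = 2 ** (2 * L)
    n_terms = Q // r
    # reduce j*p*r modulo Q first to keep the floating-point angle accurate
    amp = sum(cmath.exp(1j * 2 * math.pi * ((j * p * r) % Q) / Q) for p in range(n_terms))
    return abs(amp) ** 2 / (n_terms * Q)
```

With $L = 4$ and $r = 8$, every multiple of 32 has probability $1/8$ and all other outcomes have probability 0; with $r = 10$ the probability spreads over neighbouring values of $j$ as in Fig. 2(b).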

Given a measurement $j \approx c2^{2L}/r$ with $c \neq 0$, classical postprocessing is required to extract information about $r$. The process begins with a continued fraction expansion. To illustrate, consider factoring 143 ($L = 8$). Suppose we choose $m = 2$ and the output $j$ of QPF is 31674. The relation $j \approx c2^{2L}/r$ becomes $31674 \approx 65536c/r$. The continued fraction expansion of $31674/65536$, which approximates $c/r$, is

$$\frac{31674}{65536} = \frac{1}{\frac{32768}{15837}} = \frac{1}{2+\frac{1094}{15837}} = \cfrac{1}{2+\cfrac{1}{14+\cfrac{1}{2+\cfrac{1}{10+\cfrac{1}{52}}}}} \qquad (7)$$



The continued fraction expansion of any number between 0 and 1 is completely specified by the list of denominators, which in this case is $\{2, 14, 2, 10, 52\}$. The $n$th convergent of a continued fraction expansion is the proper fraction equivalent to the first $n$ elements of this list. An introductory exposition and further properties of continued fractions are described in Ref. […].

FIG. 2: Probability of different measurements $j$ at the end of quantum period finding with total number of states $2^8 = 256$ and (a) period $r = 8$, (b) period $r = 10$.

FIG. 3: Circuit for a 4-qubit (a) quantum Fourier transform, (b) approximate quantum Fourier transform with $d_{\max} = 1$.

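$$\begin{aligned}
\{2\} &= \tfrac{1}{2}\\
\{2,14\} &= \tfrac{14}{29}\\
\{2,14,2\} &= \tfrac{29}{60}\\
\{2,14,2,10\} &= \tfrac{304}{629}\\
\{2,14,2,10,52\} &= \tfrac{15837}{32768}
\end{aligned} \qquad (8)$$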
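The expansion of Eq. (7) and the convergents of Eq. (8) can be reproduced with exact integer arithmetic. This is a minimal sketch using the standard recurrence $p_n = a_n p_{n-1} + p_{n-2}$, $q_n = a_n q_{n-1} + q_{n-2}$; the function names are illustrative.

```python
from fractions import Fraction


def cf_denominators(num, den):
    """Denominators a1, a2, ... with num/den = 1/(a1 + 1/(a2 + ...)), for 0 < num < den."""
    terms = []
    while num:
        a, rem = divmod(den, num)
        terms.append(a)
        num, den = rem, num
    return terms


def convergents(terms):
    """n-th convergent: the fraction given by the first n denominators."""
    result = []
    p_prev, p = 1, 0      # numerators p_{n-2}, p_{n-1}
    q_prev, q = 0, 1      # denominators q_{n-2}, q_{n-1}
    for a in terms:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result


terms = cf_denominators(31674, 65536)    # Eq. (7)
fractions_8 = convergents(terms)          # Eq. (8)
```

Keeping only denominators below $2^L = 256$ leaves 2, 29 and 60, and `pow(2, 60, 143)` evaluates to 1, recovering $r = 60$.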
The period $r$ can be sought by substituting each denominator into the function $f(k) = 2^k \bmod 143$. With high probability only the largest denominator less than $2^L$ will be of interest. In this case $2^{60} \bmod 143 = 1$ and hence $r = 60$.

Two modifications to the above are required. Firstly, if $c$ and $r$ have common factors, none of the denominators will be the period, but rather one will be a divisor of $r$. After repeating QPF a number of times, let $\{j_m\}$ denote the set of measured values. Let $\{c_{mn}/d_{mn}\}$ denote the set of convergents associated with each measured value $j_m$. If a pair $c_{mn}$, $c_{m'n'}$ exists such that $\gcd(c_{mn}, c_{m'n'}) = 1$ and $d_{mn}$, $d_{m'n'}$ are divisors of $r$, then $r = \operatorname{lcm}(d_{mn}, d_{m'n'})$, where lcm denotes the least common multiple. It can be shown that given any two divisors $d_{mn}$, $d_{m'n'}$ with corresponding $c_{mn}$, $c_{m'n'}$, the probability that $\gcd(c_{mn}, c_{m'n'}) = 1$ is at least 1/4 [4]. Thus only $O(1)$ different divisors are required. In practice, it will not be known which denominators are divisors, so every pair $d_{mn}$, $d_{m'n'}$ with $\gcd(c_{mn}, c_{m'n'}) = 1$ must be tested.

The second modification is simply allowing for the output $j$ of QPF being useless. Let $s$ denote the probability that $j = \lfloor c2^{2L}/r \rfloor$ or $\lceil c2^{2L}/r \rceil$ for some $0 < c < r$, where $\lfloor\,\rfloor$ and $\lceil\,\rceil$ denote rounding down and up respectively. Such values of $j$ will be called useful, as the denominators of the associated convergents are guaranteed to include a divisor of $r$. The period $r$ sought can always be found provided $O(1/s)$ runs of QPF can be performed.
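The set of useful outcomes can be enumerated directly from its definition. A minimal sketch with a hypothetical helper name follows. Incidentally, the value $j = 31674$ of the worked example above is not in this set for $r = 60$ (since $29 \cdot 65536/60 \approx 31675.7$), yet its convergents still contained the period, so the definition of useful is a conservative one.

```python
def useful_js(r, L):
    """All j equal to c * 2^(2L) / r rounded down or up, for 0 < c < r."""
    Q = 2 ** (2 * L)
    js = set()
    for c in range(1, r):
        q, rem = divmod(c * Q, r)
        js.add(q)                       # rounded down
        js.add(q + 1 if rem else q)     # rounded up (same value when r divides c * Q)
    return js
```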

To summarize, as each new value $j_m$ is measured, the denominators $d_{mn}$ less than $2^L$ of the convergents of the continued fraction expansion of $j_m/2^{2L}$ are substituted into $f(k) = m^k \bmod N$ to determine whether any $f(d_{mn}) = 1$, which would imply that $r = d_{mn}$. If not, every pair $d_{mn}$, $d_{m'n'}$ with associated numerators $c_{mn}$, $c_{m'n'}$ satisfying $\gcd(c_{mn}, c_{m'n'}) = 1$ must be tested to see whether $r = \operatorname{lcm}(d_{mn}, d_{m'n'})$. Note that, as shown in Fig. 1, if $r$ is odd or $m^{r/2} \bmod N = \pm 1 \bmod N$, then the entire process needs to be repeated, which on average happens only $O(1)$ times. Thus Shor's algorithm always succeeds provided $O(1/s)$ runs of QPF can be performed.
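The postprocessing summarized above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it tests every pair of convergents (including pairs drawn from the same measurement) and returns the smallest candidate $d$ with $m^d \equiv 1 \pmod N$. Because every candidate is verified this way, an lcm that turns out to be only a proper divisor of $r$ is rejected rather than returned. The names `convergents` and `recover_period` are hypothetical.

```python
import math
from itertools import combinations


def convergents(j, Q):
    """(numerator, denominator) pairs of the convergents of j/Q, for 0 <= j < Q."""
    out = []
    p_prev, p, q_prev, q = 1, 0, 0, 1
    num, den = j, Q
    while num:
        a, rem = divmod(den, num)
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append((p, q))
        num, den = rem, num
    return out


def recover_period(js, L, N, m):
    """Candidates: every convergent denominator d < 2^L, plus lcm(d, d') for
    every pair whose numerators are coprime. Returns the smallest candidate
    with m**d = 1 (mod N), or None if no candidate passes."""
    Q, limit = 2 ** (2 * L), 2 ** L
    cands = [(c, d) for j in js for c, d in convergents(j, Q) if d < limit]
    trials = {d for _, d in cands}
    for (c1, d1), (c2, d2) in combinations(cands, 2):
        if math.gcd(c1, c2) == 1:
            trials.add(d1 * d2 // math.gcd(d1, d2))
    valid = sorted(d for d in trials if pow(m, d, N) == 1)
    return valid[0] if valid else None
```

For example, `recover_period([15291, 27306], 8, 143, 2)` returns 60. These two values are $c2^{16}/60$ rounded down for $c = 14$ and $c = 25$; their convergents $7/30$ and $5/12$ only expose divisors of 60, and the period is recovered through an lcm.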



II. APPROXIMATE QUANTUM FOURIER 
TRANSFORM 



A circuit that implements the QFT of Eq. (4) is shown in Fig. 3(a). Note the use of controlled rotations of magnitude $\pi/2^d$. In matrix notation these 2-qubit operations correspond to

$$\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&e^{i\pi/2^d}\end{pmatrix} \qquad (9)$$
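Eq. (9) can be checked numerically against a decomposition into single-qubit rotations and CNOTs of the kind referred to in Section III (Fig. 4). The sketch below assumes the standard construction with two CNOTs and three single-qubit phase rotations of magnitude $\pi/2^{d+1}$, which is why a controlled $\pi/64$ gate needs $\pi/128$ single-qubit rotations. Plain Python lists are used so that no external library is needed; all helper names are hypothetical.

```python
import cmath
import math


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product of square matrices; A acts on the more significant qubit."""
    return [[A[i][k] * B[j][l] for k in range(len(A)) for l in range(len(B))]
            for i in range(len(A)) for j in range(len(B))]


def phase(theta):
    """Single-qubit rotation diag(1, e^{i theta})."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]


def controlled_rotation(d):
    """Eq. (9): diag(1, 1, 1, e^{i pi / 2^d}) in the basis |control, target>."""
    U = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
    U[3][3] = cmath.exp(1j * math.pi / 2 ** d)
    return U


I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control = first qubit


def decomposed_rotation(d):
    """CNOT, phase(-h) on target, CNOT, then phase(h) on both qubits, h = pi / 2^(d+1)."""
    h = math.pi / 2 ** (d + 1)
    gates_in_time_order = [CNOT, kron(I2, phase(-h)), CNOT, kron(phase(h), phase(h))]
    U = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
    for gate in gates_in_time_order:
        U = matmul(gate, U)       # later gates multiply from the left
    return U
```

On a basis state $|c, t\rangle$ the rotations contribute a total phase of $\tfrac{\pi}{2^{d+1}}\,(c + t - c \oplus t) = \tfrac{\pi}{2^d}\,ct$, which is exactly Eq. (9).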



The approximate QFT (AQFT) circuit is very similar, with just the deletion of rotation gates with $d$ greater than some $d_{\max}$. For example, Fig. 3(b) shows an AQFT with $d_{\max} = 1$. Let $[j]_m$ denote the $m$th bit of $j$. The AQFT equivalent to Eq. (4) is

$$|k\rangle \rightarrow \frac{1}{2^L}\sum_{j=0}^{2^{2L}-1}\exp\left(\frac{2\pi i}{2^{2L}}\sum_{mn}[j]_m[k]_n 2^{m+n}\right)|j\rangle \qquad (10)$$

where $\sum_{mn}$ denotes a sum over all $m$, $n$ such that $0 \le m, n < 2L$ and $2L - (d_{\max}+1) \le m + n < 2L$. It has been shown by Coppersmith [3] that the AQFT is a good approximation of the QFT, in the sense that the phases of individual computational basis states in the output of the AQFT differ in angle from those in the output of the QFT by at most $2\pi L 2^{-d_{\max}}$. The purpose of this paper is to investigate in detail the effect of using the AQFT in Shor's algorithm.

FIG. 4: Decomposition of a controlled rotation into single-qubit gates and CNOT gates.
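Coppersmith's bound can be checked by brute force for small $L$. The sketch below evaluates the exponent of Eq. (10) in units of $2\pi/2^{2L}$, compares it with the exact QFT phase $2\pi jk/2^{2L}$ of Eq. (4) for every pair $(j, k)$, and reports the largest deviation taken modulo $2\pi$. For $d_{\max} = 2L - 1$ no rotation is deleted and the deviation vanishes. The helper names are hypothetical.

```python
import math


def aqft_phase_units(j, k, L, d_max):
    """Exponent of Eq. (10) in units of 2*pi / 2^(2L), reduced modulo 2^(2L)."""
    n_qubits, total = 2 * L, 0
    for m in range(n_qubits):
        if (j >> m) & 1:
            for n in range(n_qubits):
                if (k >> n) & 1 and n_qubits - (d_max + 1) <= m + n < n_qubits:
                    total += 1 << (m + n)
    return total % (1 << n_qubits)


def max_phase_error(L, d_max):
    """Largest |QFT phase - AQFT phase| in radians (mod 2*pi) over all basis pairs (j, k)."""
    Q = 2 ** (2 * L)
    worst = 0
    for j in range(Q):
        for k in range(Q):
            diff = (j * k - aqft_phase_units(j, k, L, d_max)) % Q
            worst = max(worst, min(diff, Q - diff))   # distance to nearest multiple of 2*pi
    return 2 * math.pi * worst / Q


def coppersmith_bound(L, d_max):
    """The bound 2*pi*L*2^(-d_max) quoted in the text."""
    return 2 * math.pi * L * 2 ** -d_max
```

The loop over all $2^{4L}$ pairs is only practical for small $L$, such as $L = 3$ (a 6-qubit transform).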



III. FAULT-TOLERANT CONSTRUCTION OF 
SMALL ANGLE ROTATION GATES 

When the 7-qubit Steane code […] and its concatenated generalizations are used to do computation, only the limited set of gates CNOT, Hadamard ($H$), $X$, $Z$, $S$ and $S^\dagger$ can be implemented easily, where

$$S = \begin{pmatrix}1&0\\0&i\end{pmatrix} \qquad (11)$$

Complicated circuits of depth in the hundreds and requiring a minimum of 22 qubits are required to implement the $T$ and $T^\dagger$ gates, where

$$T = \begin{pmatrix}1&0\\0&e^{i\pi/4}\end{pmatrix} \qquad (12)$$



Note, however, that if it is acceptable to add an additional 15 qubits for every $T$ and $T^\dagger$ gate in a sequence of fault-tolerant single-qubit gates (see for example Eq. (13)), the effective depth of each $T$ and $T^\dagger$ gate circuit can be reduced to 2. Together, the set CNOT, $H$, $X$, $Z$, $S$, $S^\dagger$, $T$ and $T^\dagger$ enables arbitrary 1- and 2-qubit gates to be approximated to any desired accuracy via the Solovay-Kitaev theorem […]. For example, the controlled $\pi/2^d$ gate can be decomposed into two CNOTs and three single-qubit rotations, as shown in Fig. 4. Approximating single-qubit $\pi/2^d$ rotations using the fault-tolerant gate set is much more difficult. For convenience, such rotations will henceforth be denoted by $R_{2^d}$. The simplest (least number of fault-tolerant gates) approximation of the $R_{128}$ single-qubit rotation gate that is more accurate than simply the identity matrix is the 31-gate product

$$U_{31} = HTHT^\dagger HTHTHTHT^\dagger HT^\dagger HT\,HTHT^\dagger HT^\dagger HTHT^\dagger HT^\dagger HT^\dagger H \qquad (13)$$

Eq. (13) was determined via an exhaustive search minimizing the metric

$$\operatorname{dist}(U, V) = \sqrt{\frac{2 - |\operatorname{tr}(U^\dagger V)|}{2}} \qquad (14)$$

The rationale of Eq. (14) is that if $U$ and $V$ are similar, $U^\dagger V$ will be close to the identity matrix (possibly up to some global phase) and the absolute value of the trace will be close to 2. By subtracting this absolute value from 2 and dividing by 2, a number between 0 and 1 is obtained. The overall square root is required to ensure that the triangle inequality

$$\operatorname{dist}(U, W) \le \operatorname{dist}(U, V) + \operatorname{dist}(V, W) \qquad (15)$$

is satisfied.
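Eq. (14) translates directly into code. This sketch works with $2 \times 2$ unitaries stored as lists of rows; the helper `R`, which builds $R_{2^d} = \mathrm{diag}(1, e^{i\pi/2^d})$ from the index $2^d$, is a hypothetical convenience. Applied to $R_{128}$ and the identity it gives about $8.68 \times 10^{-3}$, consistent with the $8.7 \times 10^{-3}$ quoted below, and it is insensitive to global phases as intended.

```python
import cmath
import math


def dist(U, V):
    """Eq. (14): sqrt((2 - |tr(U^dagger V)|) / 2) for 2x2 unitaries given as lists of rows."""
    tr = sum(U[k][i].conjugate() * V[k][i] for i in range(2) for k in range(2))
    # clamp tiny negative values caused by rounding when |tr| is numerically 2
    return math.sqrt(max(0.0, (2 - abs(tr)) / 2))


def R(two_to_d):
    """Single-qubit rotation R_{2^d} = diag(1, e^{i pi / 2^d}), indexed by 2^d as in the text."""
    return [[1, 0], [0, cmath.exp(1j * math.pi / two_to_d)]]


IDENTITY = [[1, 0], [0, 1]]
```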

The identity matrix is a good approximation of $R_{128}$, in the sense that $\operatorname{dist}(R_{128}, I) = 8.7 \times 10^{-3}$. Eq. (13) is only slightly better, with $\operatorname{dist}(R_{128}, U_{31}) = 8.1 \times 10^{-3}$. A 46-gate sequence has been found satisfying $\operatorname{dist}(R_{128}, U_{46}) = 7.5 \times 10^{-4}$. Note that this is still only 10 times better than doing nothing. Further investigation of the properties of fault-tolerant approximations of arbitrary single-qubit unitaries will be performed in the near future. For the present discussion it suffices to know that the number of gates grows somewhere between linearly and quadratically with $\ln(1/\delta)$, where $\delta = \operatorname{dist}(R, U)$, $R$ is the rotation being approximated, and $U$ is the approximating product of fault-tolerant gates (the exact scaling is not known). In particular, this means that approximating a rotation gate $R_{2^d}$ with accuracy $\delta = 1/2^d$ requires a number of gates that grows linearly or quadratically with $d$.

In addition to the inconveniently large number of fault-tolerant gates $n_\delta$ required to achieve a given approximation $\delta$, each individual gate in the approximating sequence must be implemented with probability of error $p$ less than $O(\delta/n_\delta)$. Note that $\delta$ is not a probability of error but rather a measure of the distance between the ideal gate and the approximating product, so this relationship is not exact. If the required probability $p \sim \delta/n_\delta = 1/(n_\delta 2^d)$ is too small to be achieved using a single level of QEC, the technique of concatenated QEC must be used. Roughly speaking, if a given gate can be implemented with probability of error $p$, adding an additional level of concatenation [13] leads to an error rate of $cp^2$, where $c < 1/p$. After $k$ levels the error rate is therefore roughly $(cp)^{2^k}/c$, so reaching $1/(n_\delta 2^d)$ requires $2^k = O(d)$. If the Steane code is used with seven qubits for the code and an additional five qubits for fault-tolerant correction, every additional level of concatenation requires 12 times as many qubits. This implies that if a gate is to be implemented with accuracy $1/(n_\delta 2^d)$, the number of qubits $q \propto 12^k$ scales as $O(d^{\log_2 12}) = O(d^{3.58})$. While this is a polynomial number of qubits, for even moderate values of $d$ this leads to thousands of qubits being used to achieve the required gate accuracy.
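The $O(d^{3.58})$ scaling can be illustrated with the recursion $p \rightarrow cp^2$ quoted above. This sketch is hypothetical throughout: the physical error rate $p = 10^{-4}$, the constant $c = 10^3$ and the sequence length $n_\delta = 50$ are placeholder values chosen only to show the trend, and each level is assumed to multiply the qubit count by exactly 12.

```python
import math


def levels_needed(p, c, target):
    """Smallest number of concatenation levels k such that iterating p -> c * p**2
    k times brings the gate error to at most target (requires c * p < 1)."""
    k, err = 0, p
    while err > target:
        err = c * err * err
        k += 1
    return k


def qubits_per_logical_qubit(d, p=1e-4, c=1e3, n_delta=50):
    """Hypothetical parameters; target error 1 / (n_delta * 2**d) as in the text,
    with 12 physical qubits per qubit at each level (7 code + 5 correction)."""
    k = levels_needed(p, c, 1 / (n_delta * 2 ** d))
    return 12 ** k


# Each extra level squares the error, so the number of levels grows like log2(d)
# and the qubit count like 12**log2(d) = d**log2(12).
EXPONENT = math.log2(12)   # about 3.585
```

With these placeholder values, $d = 10$ needs one level (12 qubits per logical qubit) and $d = 20$ already needs three ($12^3 = 1728$).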

Given the complexity of implementing $T$ and $T^\dagger$ gates, the number of fault-tolerant gates required to achieve good approximations of arbitrary rotation gates, and the large number of qubits required to achieve sufficiently reliable operation, it is clear that for practical reasons the use of $\pi/2^d$ rotations must be restricted to those with very small $d$.



IV. DEPENDENCE OF OUTPUT RELIABILITY
ON PERIOD OF $f(k) = m^k \bmod N$

Different values of $r$ (the period of $f(k) = m^k \bmod N$) imply different probabilities $s$ that the value $j$ measured at the end of QPF will be useful (see Fig. 5). In particular, as discussed in Section I, if $r$ is a power of 2 the probability of useful output is much higher (see Fig. 2). This section investigates how sensitive $s$ is to variations in $r$. Recall Eq. (6) for the probability of measuring a given value of $j$. When the AQFT (Eq. (10)) is used this becomes

$$\Pr(j,r,L,d_{\max}) = \left|\frac{\sqrt{r}}{2^{2L}}\sum_{p=0}^{2^{2L}/r-1}\exp\left(\frac{2\pi i}{2^{2L}}\sum_{mn}[j]_m[pr]_n 2^{m+n}\right)\right|^2 \qquad (16)$$

The probability $s$ of useful output is thus

$$s(r,L,d_{\max}) = \sum_{\{\text{useful } j\}}\Pr(j,r,L,d_{\max}) \qquad (17)$$

where $\{\text{useful } j\}$ denotes all $j = \lfloor c2^{2L}/r \rfloor$ or $\lceil c2^{2L}/r \rceil$ such that $0 < c < r$. Fig. 5 shows $s$ for $r$ ranging from 2 to $2^L - 1$ and for various values of $L$ and $d_{\max}$. The decrease in $s$ for small values of $r$ is more a result of the definition of $\{\text{useful } j\}$ than an indication of poor data. When $r$ is small there are few useful values of $j \approx c2^{2L}/r$, $0 < c < r$, and a large range of states likely to be observed around each one. Since $s$ sums only the probabilities of observing $j = \lfloor c2^{2L}/r \rfloor$ or $\lceil c2^{2L}/r \rceil$, this superficially results in a low probability of useful output. However, in practice values much further from $j \approx c2^{2L}/r$ can be used to obtain useful output. For example, if $r = 4$ and $j = 16400$, the correct output value (4) can still be determined from the continued fraction expansion of $16400/65536$, which is far from the ideal case of $16384/65536$. To simplify subsequent analysis, each pair $(L, d_{\max})$ will from now on be associated with $s(2^{L-1}+2, L, d_{\max})$, which corresponds to the minimum value of $s$ to the right of the central peak. The choice of this point as a meaningful characterization of the entire graph is justified by the discussion above. For completeness, Fig. 5(e) shows the case of noisy controlled rotation gates of the form

$$\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&e^{i(\pi/2^d+\delta)}\end{pmatrix} \qquad (18)$$

where $\delta$ is a normally distributed random variable of standard deviation $\sigma$. This has been included to simulate the effect of using approximate rotation gates built out of a finite number of fault-tolerant gates. The general form and probability of successful output can be seen to be similar despite $\sigma = \pi/32$. This $\sigma$ corresponds to $\pi/2^{d_{\max}+2}$. For a controlled $\pi/64$ rotation, single-qubit rotations of angle $\pi/128$ are required, as shown in Fig. 4. Fig. 5(e) implies that it is acceptable for this rotation to be implemented within $\pi/512$, implying that

$$U = \begin{pmatrix}1&0\\0&e^{i(\pi/128+\pi/512)}\end{pmatrix} \qquad (19)$$

is an acceptable approximation of $R_{128}$. Given that $\operatorname{dist}(R_{128}, U) = 2.1 \times 10^{-3}$, the 46-gate fault-tolerant approximation of $R_{128}$ mentioned above (with $\operatorname{dist} = 7.5 \times 10^{-4}$) is adequate.
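Eqs. (16) and (17) can be evaluated by brute force for small $L$. This sketch is a hypothetical simplification: it takes $k_0 = 0$, uses $\lfloor 2^{2L}/r \rfloor$ terms normalized by their actual number, and ignores gate noise, so it corresponds to the noiseless panels of Fig. 5 rather than Fig. 5(e). Its cost grows roughly as $2^{4L}/r$, in line with the observation in Section V that the calculation becomes exponentially harder as $L$ increases. All function names are illustrative.

```python
import cmath
import math


def aqft_phase_units(j, k, L, d_max):
    """Exponent of Eq. (10) in units of 2*pi / 2^(2L), reduced modulo 2^(2L)."""
    n_qubits, total = 2 * L, 0
    for m in range(n_qubits):
        if (j >> m) & 1:
            for n in range(n_qubits):
                if (k >> n) & 1 and n_qubits - (d_max + 1) <= m + n < n_qubits:
                    total += 1 << (m + n)
    return total % (1 << n_qubits)


def prob_aqft(j, r, L, d_max):
    """Pr(j, r, L, d_max) of Eq. (16), assuming k0 = 0 and floor(2^(2L)/r) terms."""
    Q = 2 ** (2 * L)
    n_terms = Q // r
    amp = sum(cmath.exp(1j * 2 * math.pi * aqft_phase_units(j, p * r, L, d_max) / Q)
              for p in range(n_terms))
    # equals the sqrt(r)/2^(2L) prefactor of Eq. (16) when r divides 2^(2L)
    return abs(amp) ** 2 / (n_terms * Q)


def useful_js(r, L):
    """j equal to c * 2^(2L) / r rounded down or up, for 0 < c < r."""
    Q = 2 ** (2 * L)
    js = set()
    for c in range(1, r):
        q, rem = divmod(c * Q, r)
        js.update({q, q + 1} if rem else {q})
    return js


def s(r, L, d_max):
    """Eq. (17): total probability of measuring a useful j."""
    return sum(prob_aqft(j, r, L, d_max) for j in useful_js(r, L))
```

The characteristic point used in the text corresponds to `s(2 ** (L - 1) + 2, L, d_max)`. For $d_{\max} = 2L - 1$ nothing is deleted and the exact QFT result is recovered; for instance, `s(8, 3, 5)` gives $7/8$.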



V. DEPENDENCE OF OUTPUT RELIABILITY 
ON INTEGER LENGTH AND GATE 
RESTRICTIONS 

In order to determine how the probability of useful output $s$ depends on both the integer length $L$ and the minimum allowed controlled rotation $\pi/2^{d_{\max}}$, Eq. (17) was solved with $r = 2^{L-1} + 2$, as discussed in Section IV. Fig. 6 contains semilog plots of $s$ versus $L$ for different values of $d_{\max}$. Note that Eq. (17) grows exponentially more difficult to solve as $L$ increases.

For $d_{\max}$ from 0 to 5, the exponential decrease of $s$ with increasing $L$ is clear. Asymptotic lines of best fit of the form

$$s \propto 2^{-L/t} \qquad (20)$$

have been shown. Note that for $d_{\max} > 0$, the value of $t$ increases by more than a factor of 4 when $d_{\max}$ increases by 1. This enables one to generalize Eq. (20) to an asymptotic lower bound valid for all $d_{\max} > 0$:

$$s \propto 2^{-L/4^{d_{\max}-1}} \qquad (21)$$

with the constant of proportionality approximately equal to 1.

Keeping in mind that the required number of repetitions of QPF is $O(1/s)$, one can relate $L_{\max}$ to $d_{\max}$ by introducing an additional parameter $f_{\max}$ characterizing the acceptable number of repetitions of QPF:

$$L_{\max} \sim 4^{d_{\max}-1}\log_2 f_{\max} \qquad (22)$$
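A short sketch of how Eq. (22) follows from Eq. (21), taking the proportionality constant in Eq. (21) to be 1 as stated and requiring the expected number of QPF runs, roughly $1/s$, not to exceed $f_{\max}$:

$$\frac{1}{s} \approx 2^{L/4^{d_{\max}-1}} \le f_{\max} \quad\Longleftrightarrow\quad L \le 4^{d_{\max}-1}\log_2 f_{\max}.$$

The largest factorable length therefore grows by a factor of 4 for each additional rotation level $d_{\max}$, but only logarithmically with the tolerated number of repetitions $f_{\max}$.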



Available RSA […] encryption programs such as PGP typically use integers of length $L$ up to 4096. The circuit of Ref. […] runs in $150L^3$ steps when an architecture that can interact arbitrary pairs of qubits in parallel is assumed and fault-tolerant gates are used. Note that to first order in $L$ the number of steps does not increase as additional levels of QEC are used. Thus $\sim 10^{13}$ steps are required to perform a single run of QPF. On an electron spin or charge quantum computer [13, 15] running at 10 GHz this corresponds to $\sim 15$ minutes of computing. If we assume $\sim 24$ hours of computing is acceptable, then $f_{\max} \sim 10^2$. Substituting these values of $L_{\max}$ and $f_{\max}$ into Eq. (22) gives $d_{\max} = 6$ after rounding up. Thus, provided controlled $\pi/64$ rotations can be implemented accurately, implying the need to accurately implement $\pi/128$ single-qubit rotations, it is conceivable that a quantum computer could one day be used to break a 4096-bit RSA encryption in a single day.
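The arithmetic behind this estimate can be reproduced in a few lines. The inputs (the $150L^3$ step count, a 10 GHz clock, one day of computing and $f_{\max} = 10^2$) are taken from the text, and `min_d_max` is a hypothetical helper that inverts Eq. (22). Note that $150 \cdot 4096^3 \approx 1.03 \times 10^{13}$ steps at 10 GHz comes to about 17 minutes, of the same order as the roughly 15 minutes quoted.

```python
import math


def min_d_max(L_max, f_max):
    """Invert Eq. (22), L_max ~ 4**(d_max - 1) * log2(f_max), rounding d_max up."""
    return math.ceil(1 + math.log(L_max / math.log2(f_max), 4))


L = 4096                            # RSA modulus length in bits
steps = 150 * L ** 3                # steps for one run of QPF
seconds_per_run = steps / 10e9      # 10 GHz clock
runs_per_day = 24 * 3600 / seconds_per_run
d_max = min_d_max(L, f_max=1e2)     # f_max ~ 10^2 as in the text
```

About 84 runs fit into a day, consistent with $f_{\max} \sim 10^2$, and $d_{\max}$ evaluates to 6, corresponding to controlled $\pi/64$ rotations.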
VI. CONCLUSION
FIG. 5: Probability $s$ of obtaining useful output from quantum period finding as a function of period $r$ for different integer lengths $L$ and rotation gate restrictions $\pi/2^{d_{\max}}$. The effect of using inaccurate controlled rotation gates ($\sigma = \pi/32$) is shown in (e).



We have demonstrated the robustness of Shor's algorithm when a limited set of rotation gates is used. The length $L_{\max}$ of the longest factorable integer can be related to the maximum acceptable number of runs of quantum period finding $f_{\max}$ and the smallest accurately implementable controlled rotation gate $\pi/2^{d_{\max}}$ via $L_{\max} \sim 4^{d_{\max}-1}\log_2 f_{\max}$. Integers thousands of bits in length can be factored provided controlled $\pi/64$ rotations can be implemented with rotation angle accurate to $\pi/256$. Sufficiently accurate fault-tolerant constructions of such controlled rotation gates have been described.
FIG. 6: Dependence of the probability of useful output from the quantum part of Shor's algorithm on the length $L$ of the integer being factored for different levels of restriction of controlled rotation gates of angle $\pi/2^{d_{\max}}$.